may not be a constant.
Based on the principle of the kernel method, the idea of using a
mutation matrix to quantify how similar two sequences are, which is
used in sequence homology alignment, can be applied to measure the
similarity between peptides as well [Holm and Sander, 1996]. For
example, to align two protein sequences, an amino acid mutation
matrix such as the Dayhoff matrix [Dayhoff and Schwartz, 1978] has
been used [Lipman, et al., 1989]. This idea has led to the
development of algorithms which employ a mutation matrix for
protease cleavage site discovery.
A mutation matrix is normally derived from a large number of
sequences. A substitution entry in a mutation matrix represents the
mutation rate between two amino acids. A mutation matrix is commonly
derived by investigating thousands of sequences within a biological
family. This means that a mutation matrix is perhaps a more
biologically sound and robust way to measure the similarity between
peptides.
Suppose there are three factor Xa protease cleaved peptides,
x1=IEGRT, x2=IEGRI and x3=IEGRD. When using the binary encoding
approach, the vector for each amino acid in a peptide has only one
entry set to one, leaving the other 19 entries as zeros. Therefore,
the pairwise distance between a pair of peptides depends only on the
residue positions at which the peptides differ. The pairwise distance
between these three sub-sequences is always one because only one
residue position is occupied by different amino acids. However, the
pairwise homology score (similarity) between these sub-sequences will
not be a constant. Table 3.10 shows a partial amino acid mutation
matrix. Suppose this mutation matrix is used to measure the pairwise
similarity between these three sub-sequences.
The partial similarity between x1 and x2 for the fifth residue
(between the amino acid T and the amino acid I) is −2. The partial
similarity between x1 and x3 for the fifth residue (between the amino
acid T and the amino acid D) is −4. The partial similarity between x2
and x3 for the fifth residue (between the amino acid I and the amino
acid D) is −6. It can be seen that they are not constants at all. This
shows a very important concept: the binary encoding approach may not
be able to reflect the true biological relationship between peptides.
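
The contrast can be checked with a minimal sketch. It makes several assumptions: x2 is taken to be IEGRI (inferred from the T-versus-I comparison), the binary-encoding distance is read as the number of mismatched residue positions, and the score table holds only the three fifth-residue values quoted in the text rather than the full matrix of Table 3.10. The names `one_hot`, `binary_distance`, `PARTIAL_SCORES` and `fifth_residue_score` are hypothetical helpers introduced for illustration.

```python
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(peptide):
    """Binary encoding: one 20-entry vector per residue, a single entry set to 1."""
    return [[1 if aa == residue else 0 for aa in AMINO_ACIDS] for residue in peptide]


def binary_distance(p, q):
    """Number of residue positions at which the two binary encodings differ."""
    diff = sum(
        abs(a - b)
        for vec_p, vec_q in zip(one_hot(p), one_hot(q))
        for a, b in zip(vec_p, vec_q)
    )
    return diff // 2  # each mismatched residue changes exactly two entries


# Assumption: only the three scores quoted in the text are listed here;
# the remaining entries of the mutation matrix are not reproduced.
PARTIAL_SCORES = {
    frozenset("TI"): -2,
    frozenset("TD"): -4,
    frozenset("ID"): -6,
}


def fifth_residue_score(p, q):
    """Partial similarity contributed by the fifth residue pair."""
    return PARTIAL_SCORES[frozenset((p[4], q[4]))]


# x2 = IEGRI is inferred from the text, not quoted in full.
peptides = {"x1": "IEGRT", "x2": "IEGRI", "x3": "IEGRD"}

results = {}
for (name_a, pep_a), (name_b, pep_b) in combinations(peptides.items(), 2):
    results[(name_a, name_b)] = (
        binary_distance(pep_a, pep_b),
        fifth_residue_score(pep_a, pep_b),
    )
    print(name_a, name_b, *results[(name_a, name_b)])
```

Under these assumptions the binary distance is the same for every pair, while the fifth-residue partial similarity differs from pair to pair, which is the point the example in the text makes.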